{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to Use CleanVision"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cleanlab/cleanvision/blob/main/docs/source/tutorials/tutorial.ipynb) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden",
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install -U pip\n",
    "!pip install git+https://github.com/cleanlab/cleanvision.git"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "Use `pip install cleanvision` to install a stable release of the package.\n",
    "\n",
    "**After you install these packages, you may need to restart your notebook runtime before running the rest of this notebook.**"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is CleanVision?\n",
    "**CleanVision** is built to automatically detects various issues in image datasets, such as images that are: (near) duplicates, blurry, over/under-exposed, etc. This data-centric AI package is designed as a quick first step for any computer vision project to find problems in your dataset, which you may want to address before applying machine learning.\n",
    "\n",
    "\n",
    "|     | Issue Type      | Description                                                                                  | Issue Key        |\n",
    "|-----|------------------|----------------------------------------------------------------------------------------------|------------------|\n",
    "| 1   | Light            | Images that are too bright/washed out in the dataset                                         | light            |\n",
    "| 2   | Dark             | Images that are irregularly dark                                                             | dark             |\n",
    "| 3   | Odd Aspect Ratio | Images with an unusual aspect ratio (i.e. overly skinny/wide)                                                       | odd_aspect_ratio |\n",
    "| 4   | Exact Duplicates | Images that are exact duplicates of each other                          | exact_duplicates |\n",
    "| 5   | Near Duplicates  | Images that are almost visually identical to each other (e.g. same image with different filters)                                 | near_duplicates  |\n",
    "| 6   | Blurry           | Images that are blurry or out of focus                                                  | blurry           |\n",
    "| 7   | Grayscale        | Images that are grayscale (lacking color)                                                            | grayscale        |\n",
    "| 8   | Low Information  | Images that lack much information (e.g. a completely black image with a few white dots) | low_information  |\n",
    "| 9   | Odd Size  | Images that are abnormally large or small compared to the rest of the dataset | odd_size  |\n",
    "\n",
    "\n",
    "The **Issue Key** column specifies the name for each type of issue in CleanVision code. See our examples which use these keys to detect only particular issue types and specify nondefault parameter settings to use when checking for certain issues."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook uses an example dataset, that you can download using these commands."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "!wget - nc 'https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip'\n",
    "!unzip -q image_files.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```shell\n",
    "wget - nc 'https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip'\n",
    "unzip -q image_files.zip\n",
    "```"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Examples"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Using CleanVision to detect issues in your dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from cleanvision import Imagelab\n",
    "\n",
    "# Path to your dataset, you can specify your own dataset path\n",
    "dataset_path = \"./image_files/\"\n",
    "\n",
    "# Initialize imagelab with your dataset\n",
    "imagelab = Imagelab(data_path=dataset_path)\n",
    "\n",
    "# Find issues\n",
    "imagelab.find_issues()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `report()` method helps you quickly understand the major issues detected in the dataset. It reports the number of images in the dataset that exhibit each type of issue, and shows example images corresponding to the most severe instances of each issue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.report()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The main way to interface with your data is via the [Imagelab](../cleanvision/imagelab.rst#cleanvision.imagelab.Imagelab) class. This class can be used to understand the issues in your dataset at a high level (global overview) and low level (issues and quality scores for each image) as well as additional information about the dataset. It has three main attributes:\n",
    "\n",
    "- `Imagelab.issue_summary`\n",
    "- `Imagelab.issues`\n",
    "- `Imagelab.info`\n",
    "\n",
    "#### imagelab.issue_summary\n",
    "This is a Dataframe containing a comprehensive summary of all detected issue types within your dataset, along with their respective prevalence levels. Each row in this summary includes the following information:\n",
    "\n",
    "`issue_type`: The name of the detected issue.\\\n",
    "`num_images`: The number of images exhibiting the identified issue within the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.issue_summary"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### imagelab.issues\n",
    "\n",
    "DataFrame assessing each image in your dataset, reporting which issues each image exhibits and a quality score for each type of issue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.issues.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is a Boolean column for each issue type, showing whether each image exhibits that type of issue or not. For example,  the rows where the `is_dark_issue` column contains `True`, those rows correspond to images that appear too **dark**.\n",
    "\n",
    "For the **dark** issue type (and more generally for other types of issues), there is a numeric column `dark_score`, which assesses how severe this issue is in each image. These quality scores lie between 0 and 1, where lower values indicate more severe instances of the issue (images which are darker in this example).\n",
    "\n",
    "One use-case for `imagelab.issues` is to filter out all images exhibiting  one particular type of issue and rank them by their quality score. Here's how to get all blurry images ranked by their `blurry_score`, note lower scores indicate higher severity:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "blurry_images = imagelab.issues[imagelab.issues[\"is_blurry_issue\"] == True].sort_values(\n",
    "    by=[\"blurry_score\"]\n",
    ")\n",
    "blurry_image_files = blurry_images.index.tolist()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "Visualize the blurry images"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.visualize(image_files=blurry_image_files[:4])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A shorter way to accomplish the above task is to specify an issue type in `imagelab.visualize()`. This will show images ordered by the severity of this issue within them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.visualize(issue_types=[\"blurry\"])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### imagelab.info\n",
    "\n",
    "This is a nested dictionary containing statistics about the images and other miscellaneous information stored while checking for issues in the dataset. Beware: this dictionary may be large and poorly organized (it is only intended for advanced users).\n",
    "\n",
    "Possible keys in this dict are **statistics** and a key corresponding to each issue type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.info.keys()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "`imagelab.info['statistics']` is also a dict containing statistics calculated on images while checking for issues in the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.info[\"statistics\"].keys()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "You can see **size** statistics for the dataset below. Here we observe, both the 25th and 75th percentile are 256 for the dataset, hence images that are further away from this range are detected as oddly sized."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.info[\"statistics\"][\"size\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Duplicate sets\n",
    "`imagelab.info` can also be used to retrieve which images are near or exact  duplicates of each other. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`issue.summary` shows the number of exact duplicate images but does not show how many such *sets* of duplicates images exist in the dataset. To see the number of exact duplicate sets, you can use `imagelab.info`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.info[\"exact_duplicates\"][\"num_sets\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also get exactly which images are there in each (exact/near) duplicated set using `imagelab.info`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.info[\"exact_duplicates\"][\"sets\"]"
   ]
  },
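  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the difference between the number of duplicate *images* and the number of duplicate *sets* concrete, here is a minimal sketch. It assumes each set is a list of file paths; the file names are hypothetical, and keeping the first file of each set is only one possible cleanup policy, not something CleanVision applies for you.\n",
    "\n",
    "```python\n",
    "# Hypothetical duplicate sets (made-up file names, not from the example dataset).\n",
    "duplicate_sets = [\n",
    "    ['a.png', 'b.png'],\n",
    "    ['c.png', 'd.png', 'e.png'],\n",
    "]\n",
    "\n",
    "# Number of sets versus total number of images that belong to some set.\n",
    "num_sets = len(duplicate_sets)\n",
    "num_duplicate_images = sum(len(files) for files in duplicate_sets)\n",
    "\n",
    "# One possible policy: keep the first file of each set, drop the others.\n",
    "files_to_keep = [files[0] for files in duplicate_sets]\n",
    "files_to_drop = [name for files in duplicate_sets for name in files[1:]]\n",
    "\n",
    "print(num_sets, num_duplicate_images)\n",
    "print(files_to_drop)\n",
    "```\n",
    "\n",
    "With real results, you could try passing `imagelab.info[\"exact_duplicates\"][\"sets\"]` in place of the hypothetical list."
   ]
  },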
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**The rest of this notebook demonstrates more advanced/customized workflows you can do with CleanVision.**"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Using CleanVision to detect specific issues"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It might be the case that only a few issue types are relevant for your dataset and you don't want to run it through all checks to save time. You can do so by specifying `issue_types` as an argument.\n",
    "\n",
    "`issue_types` is a dict, where keys are the issue types that you want to detect and values are dict which contains hyperparameters. This example uses default hyperparameters, in which case you can leave the hyperparameter dict empty. To find keys for issue types check the above table that lists all issue types supported by CleanVision. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize imagelab with your dataset\n",
    "imagelab = Imagelab(data_path=dataset_path)\n",
    "\n",
    "# specify issue types to detect\n",
    "issue_types = {\"dark\": {}}\n",
    "\n",
    "# Find issues\n",
    "imagelab.find_issues(issue_types)\n",
    "\n",
    "# Show a report of the issues found\n",
    "imagelab.report()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Check for additional types of issues using the same instance"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose you also want to check for blurry images  after having already detected dark images in the dataset. You can use the **same** Imagelab instance to incrementally check for another type of issue like  blurry images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "issue_types = {\"blurry\": {}}\n",
    "\n",
    "imagelab.find_issues(issue_types)\n",
    "\n",
    "imagelab.report()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Save and load"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CleanVision also has a save and load functionality that you can use to save the results and load them at a later point in time to see results or run more checks."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For saving, specify `force=True` to overwrite existing files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "save_path = \"./results\"\n",
    "imagelab.save(save_path)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For loading a saved instance, specify `dataset_path` to help check for any inconsistencies between dataset paths in the previous and current run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab = Imagelab.load(save_path, dataset_path)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Check for an issue with a different threshold"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use the loaded imagelab instance to check for an issue type with a custom hyperparameter. Here is a table of hyperparameters that each issue type supports and their permissible values. \n",
    "\n",
    "`threshold`- All images with scores below this threshold will be flagged as an issue.\n",
    "\n",
    "`hash_size` - This controls how much detail about an image we want to keep for getting perceptual hash. Higher sizes imply more detail.\n",
    "\n",
    "`hash_type` - Type of perceptual hash to use. Currently `whash`, `phash`, `ahash`, `dhash` and `chash` are the supported hash types. Check [here](https://github.com/JohannesBuchner/imagehash) for more details on these hash types.\n",
    "\n",
    "|   | Issue Key        | Hyperparameters                                   |\n",
    "|---|------------------|---------------------------------------------------|\n",
    "| 1 | light            | threshold (between 0 and 1)                       |\n",
    "| 2 | dark             | threshold (between 0 and 1)                       |\n",
    "| 3 | odd_aspect_ratio | threshold (between 0 and 1)                       |\n",
    "| 4 | exact_duplicates | N/A                                               |\n",
    "| 5 | near_duplicates  | hash_size (int, only power of 2 for whash), hash_types (whash, phash, ahash, dhash, chash) |\n",
    "| 6 | blurry           | threshold (between 0 and 1)                       |\n",
    "| 7 | grayscale        | threshold (between 0 and 1)                       |\n",
    "| 8 | low_information  | threshold (between 0 and 1)                       |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "issue_types = {\"dark\": {\"threshold\": 0.2}}\n",
    "imagelab.find_issues(issue_types)\n",
    "\n",
    "imagelab.report()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the number of images with dark issue has reduced from the previous run."
   ]
  },
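  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why does the count change? Images are flagged when their score falls below the threshold, so a lower threshold flags fewer images. Here is a minimal sketch of that rule, using hypothetical scores and a hypothetical helper `flag_below` (neither is part of CleanVision, and the two thresholds are chosen only for illustration).\n",
    "\n",
    "```python\n",
    "# Hypothetical dark scores (lower = darker); not taken from the example dataset.\n",
    "dark_scores = {'img_a.jpg': 0.05, 'img_b.jpg': 0.15, 'img_c.jpg': 0.25, 'img_d.jpg': 0.60}\n",
    "\n",
    "\n",
    "def flag_below(scores, threshold):\n",
    "    # Names whose score is below the threshold, most severe (lowest score) first.\n",
    "    flagged = [name for name, score in scores.items() if score < threshold]\n",
    "    return sorted(flagged, key=scores.get)\n",
    "\n",
    "\n",
    "flagged_at_03 = flag_below(dark_scores, threshold=0.3)\n",
    "flagged_at_02 = flag_below(dark_scores, threshold=0.2)\n",
    "\n",
    "# Lowering the threshold leaves fewer flagged images.\n",
    "print(len(flagged_at_03), len(flagged_at_02))\n",
    "```\n",
    "\n",
    "The same reasoning applies to every issue type that accepts a `threshold` hyperparameter."
   ]
  },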
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. Run CleanVision for default issue types, but override hyperparameters for one or more issues"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab = Imagelab(data_path=dataset_path)\n",
    "\n",
    "# Check for all default issue types\n",
    "imagelab.find_issues()\n",
    "\n",
    "# Specify an issue with custom hyperparameters\n",
    "issue_types = {\"odd_aspect_ratio\": {\"threshold\": 0.2}}\n",
    "\n",
    "# Run find issues again with specified issue types\n",
    "imagelab.find_issues(issue_types)\n",
    "\n",
    "\n",
    "# Pass list of issue_types to imagelab.report() to report only those issue_types\n",
    "imagelab.report([\"odd_aspect_ratio\", \"low_information\"])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7. Customize report"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Report can also be customized in various ways to help with the analysis. For example, you can change the `num_images` to show for each issue type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Change verbosity\n",
    "imagelab.report(issue_types=[\"dark\"], num_images=10)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may want to exclude issues from your report which are prevalent in say more than 50% of the dataset and are not real issues but just how the dataset is, for example dark images in an astronomy dataset may not be an issue. You can use the `max_prevalence` parameter in report to exclude such issues. In this example all issues present in more than 3% of the dataset are excluded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.report(max_prevalence=0.03)"
   ]
  },
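  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The filtering idea behind `max_prevalence` can be sketched in a few lines. The counts below are hypothetical, and whether an issue sitting exactly at the limit is kept is an assumption of this sketch rather than documented CleanVision behavior.\n",
    "\n",
    "```python\n",
    "# Hypothetical dataset size and per-issue image counts.\n",
    "num_total_images = 200\n",
    "issue_counts = {'dark': 90, 'blurry': 5, 'odd_size': 2}\n",
    "max_prevalence = 0.03\n",
    "\n",
    "# Prevalence = fraction of the dataset that exhibits each issue.\n",
    "prevalence = {issue: count / num_total_images for issue, count in issue_counts.items()}\n",
    "\n",
    "# Keep only issues whose prevalence does not exceed the limit (assumed inclusive).\n",
    "reported_issues = [issue for issue, frac in prevalence.items() if frac <= max_prevalence]\n",
    "\n",
    "print(reported_issues)\n",
    "```\n",
    "\n",
    "Here the hypothetical `dark` issue affects 45% of the images, so it is left out of the report, while the two rarer issues remain."
   ]
  },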
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8. Visualize specific issues"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Imagelab provides `imagelab.visualize` that you can use to see examples of specific issues in your dataset.\n",
    "\n",
    "`num_images` and `cell_size` are optional arguments, that you can use to control number of examples of each issue type and size of each image in the grid respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "issue_types = [\"grayscale\"]\n",
    "imagelab.visualize(issue_types=issue_types, num_images=3, cell_size=(3, 3))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced: Create your own issue type"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also create a custom issue type by extending the base class [IssueManager](../cleanvision/utils/base_issue_manager.rst#cleanvision.utils.base_issue_manager.IssueManager). CleanVision can then detect your custom issue along with other pre-defined issues in any image dataset! Here's an example of a custom issue manager, which can also be found [here](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/custom_issue_manager.py)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Any, Dict, List, Optional\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from PIL import Image\n",
    "from tqdm.auto import tqdm\n",
    "\n",
    "from cleanvision.dataset.base_dataset import Dataset\n",
    "from cleanvision.issue_managers import register_issue_manager\n",
    "from cleanvision.utils.base_issue_manager import IssueManager\n",
    "from cleanvision.utils.utils import get_is_issue_colname, get_score_colname\n",
    "\n",
    "ISSUE_NAME = \"custom\"\n",
    "\n",
    "\n",
    "@register_issue_manager(ISSUE_NAME)\n",
    "class CustomIssueManager(IssueManager):\n",
    "    \"\"\"\n",
    "    Example class showing how you can self-define a custom type of issue that\n",
    "    CleanVision can simultaneously check your data for alongside its built-in issue types.\n",
    "    \"\"\"\n",
    "\n",
    "    issue_name: str = ISSUE_NAME\n",
    "    visualization: str = \"individual_images\"\n",
    "\n",
    "    def __init__(self) -> None:\n",
    "        super().__init__()\n",
    "        self.params = self.get_default_params()\n",
    "\n",
    "    def get_default_params(self) -> Dict[str, Any]:\n",
    "        return {\"threshold\": 0.4}\n",
    "\n",
    "    def update_params(self, params: Dict[str, Any]) -> None:\n",
    "        self.params = self.get_default_params()\n",
    "        non_none_params = {k: v for k, v in params.items() if v is not None}\n",
    "        self.params = {**self.params, **non_none_params}\n",
    "\n",
    "    @staticmethod\n",
    "    def calculate_mean_pixel_value(image: Image.Image) -> float:\n",
    "        gray_image = image.convert(\"L\")\n",
    "        return np.mean(np.array(gray_image))\n",
    "\n",
    "    def get_scores(self, raw_scores: List[float]) -> \"np.ndarray[Any, Any]\":\n",
    "        scores = np.array(raw_scores)\n",
    "        return scores / 255.0\n",
    "\n",
    "    def mark_issue(self, scores: pd.Series, threshold: float) -> pd.Series:\n",
    "        return scores < threshold\n",
    "\n",
    "    def update_summary(self, summary_dict: Dict[str, Any]) -> None:\n",
    "        self.summary = pd.DataFrame({\"issue_type\": [self.issue_name]})\n",
    "        for column_name, value in summary_dict.items():\n",
    "            self.summary[column_name] = [value]\n",
    "\n",
    "    def find_issues(\n",
    "        self,\n",
    "        *,\n",
    "        params: Optional[Dict[str, Any]] = None,\n",
    "        dataset: Optional[Dataset] = None,\n",
    "        imagelab_info: Optional[Dict[str, Any]] = None,\n",
    "        **kwargs: Any,\n",
    "    ) -> None:\n",
    "        super().find_issues(**kwargs)\n",
    "        assert params is not None\n",
    "        assert imagelab_info is not None\n",
    "        assert dataset is not None\n",
    "\n",
    "        self.update_params(params)\n",
    "\n",
    "        raw_scores = []\n",
    "        for idx in tqdm(dataset.index):\n",
    "            image = dataset[idx]\n",
    "            raw_scores.append(self.calculate_mean_pixel_value(image))\n",
    "\n",
    "        score_colname = get_score_colname(self.issue_name)\n",
    "        is_issue_colname = get_is_issue_colname(self.issue_name)\n",
    "\n",
    "        scores = pd.DataFrame(index=dataset.index)\n",
    "        scores[score_colname] = self.get_scores(raw_scores)\n",
    "\n",
    "        is_issue = pd.DataFrame(index=dataset.index)\n",
    "        is_issue[is_issue_colname] = self.mark_issue(\n",
    "            scores[score_colname], self.params[\"threshold\"]\n",
    "        )\n",
    "\n",
    "        self.issues = pd.DataFrame(index=dataset.index)\n",
    "        self.issues = self.issues.join(scores)\n",
    "        self.issues = self.issues.join(is_issue)\n",
    "\n",
    "        self.info[self.issue_name] = {\"PixelValue\": raw_scores}\n",
    "        summary_dict = self._compute_summary(\n",
    "            self.issues[get_is_issue_colname(self.issue_name)]\n",
    "        )\n",
    "\n",
    "        self.update_summary(summary_dict)"
   ]
  },
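  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The scoring rule in the class above (mean grayscale value divided by 255, flagged when below the threshold) can be checked on its own. The sketch below uses only the standard library; the helper names and pixel values are hypothetical and stand in for the NumPy/PIL-based methods of the class.\n",
    "\n",
    "```python\n",
    "from statistics import mean\n",
    "\n",
    "THRESHOLD = 0.4  # same value as get_default_params() in the class above\n",
    "\n",
    "\n",
    "def custom_score(pixels):\n",
    "    # Mean 8-bit grayscale value rescaled to the range [0, 1].\n",
    "    return mean(pixels) / 255.0\n",
    "\n",
    "\n",
    "def is_custom_issue(pixels, threshold=THRESHOLD):\n",
    "    # Flag the image when its score falls below the threshold.\n",
    "    return custom_score(pixels) < threshold\n",
    "\n",
    "\n",
    "# Hypothetical flattened grayscale pixel values for two tiny images.\n",
    "dim_image = [10, 20, 30, 40]\n",
    "bright_image = [200, 220, 240, 250]\n",
    "\n",
    "print(is_custom_issue(dim_image), is_custom_issue(bright_image))\n",
    "```\n",
    "\n",
    "Because low scores mean low average brightness, this custom issue flags dim images, which is the same convention the built-in scores follow (lower score, more severe)."
   ]
  },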
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 9. Run CleanVision with a custom issue"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab = Imagelab(data_path=dataset_path)\n",
    "\n",
    "issue_name = CustomIssueManager.issue_name\n",
    "\n",
    "\n",
    "# To ensure your issue manager is registered, check list of possible issue types\n",
    "# issue_name should be present in this list\n",
    "imagelab.list_possible_issue_types()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "issue_types = {issue_name: {}}\n",
    "imagelab.find_issues(issue_types)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.report()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Beyond the collection of image files demonstrated here, you can alternatively run CleanVision on: [Hugging Face datasets](huggingface_dataset.ipynb), [torchvision datasets](torchvision_dataset.ipynb), as well as [files in cloud storage buckets like S3, GCS, or Azure](https://github.com/cleanlab/cleanvision-examples/blob/main/cloud_dataset.ipynb)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
